spec : fix draft model checkpoints #22521

Merged

ggerganov merged 3 commits into master from gg/spec-fix-draft-checkpoints on Apr 30, 2026

Conversation

@ggerganov
Member

Overview

cont #19493

Improve the logic for when to create and restore the draft model checkpoints. The old logic discarded the checkpoint on every new completion request, resulting in long draft-model recomputes during large agentic sessions.
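
A minimal sketch of the retention idea, for illustration only; it is not the llama.cpp implementation, and all names in it (`draft_checkpoint`, `common_prefix_len`, `tokens_to_recompute`) are hypothetical. The point is that a checkpoint kept across requests lets the draft model reuse the shared prompt prefix instead of recomputing its KV cache from scratch:

```cpp
// Hypothetical sketch (not llama.cpp code): keep the draft checkpoint across
// completion requests and reuse the longest shared prompt prefix.
#include <cstddef>
#include <cstdio>
#include <optional>
#include <vector>

using token = int;

struct draft_checkpoint {
    std::vector<token> tokens; // tokens already in the draft model's KV cache
};

// length of the shared prefix between the cached tokens and the new prompt
static size_t common_prefix_len(const std::vector<token> & a, const std::vector<token> & b) {
    size_t n = 0;
    while (n < a.size() && n < b.size() && a[n] == b[n]) {
        n++;
    }
    return n;
}

// old behavior: discard the checkpoint on every request, recomputing the whole
// prompt; the fix sketched here keeps it and recomputes only the new suffix
static size_t tokens_to_recompute(std::optional<draft_checkpoint> & ckpt, const std::vector<token> & prompt) {
    const size_t reuse = ckpt ? common_prefix_len(ckpt->tokens, prompt) : 0;
    ckpt = draft_checkpoint{prompt}; // refresh the checkpoint for the next request
    return prompt.size() - reuse;    // only the non-shared suffix needs compute
}

int main() {
    std::optional<draft_checkpoint> ckpt;

    const std::vector<token> req1 = {1, 2, 3, 4};
    const std::vector<token> req2 = {1, 2, 3, 4, 5, 6}; // agentic follow-up extending req1

    printf("request 1 recomputes %zu tokens\n", tokens_to_recompute(ckpt, req1)); // 4
    printf("request 2 recomputes %zu tokens\n", tokens_to_recompute(ckpt, req2)); // 2, thanks to reuse
}
```

With the old discard-on-request behavior, the second request would recompute all 6 tokens; keeping the checkpoint cuts that to the 2 new ones, which is what avoids the long draft-model recomputes in large agentic sessions.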

ggerganov marked this pull request as ready for review April 29, 2026 17:50
ggerganov requested review from a team as code owners April 29, 2026 17:50
ggerganov merged commit 80afa33 into master Apr 30, 2026
45 of 46 checks passed
ggerganov deleted the gg/spec-fix-draft-checkpoints branch April 30, 2026 05:32
tekintian added a commit to tekintian/llama.cpp that referenced this pull request May 1, 2026
* 'master' of github.com:tekintian/llama.cpp: (659 commits)
  ggml-webgpu: Improve performance of mat-vec and mat-mat for MUL_MAT_ID (ggml-org#22464)
  Update llama-mmap to use ftello/fseeko (ggml-org#22497)
  common : check for null getpwuid in hf-cache (ggml-org#22550)
  vulkan: add get/set tensor 2d functions (ggml-org#22514)
  spec: fix argument typo (ggml-org#22552)
  ci : bump ty to 0.0.33 (ggml-org#22535)
  vendor : update cpp-httplib to 0.43.2 (ggml-org#22548)
  CUDA: fix tile FA kernel on Pascal (ggml-org#22541)
  scripts : add wc2wt.sh - create worktree from current HEAD (ggml-org#22513)
  add fast matmul iquants (ggml-org#22504)
  spec : fix draft model checkpoints (ggml-org#22521)
  spec : fix vocab compat checks in spec example (ggml-org#22426)
  common : do not pass prompt tokens to reasoning budget sampler (ggml-org#22488)
  hexagon: make vmem and buffer-size configurable (ggml-org#22487)
  CUDA: fuse SSM_CONV + ADD(bias) + SILU (ggml-org#22478)
  spec : disacard last drafted token with low prob (ggml-org#22506)
  sync : ggml
  ggml : bump version to 0.10.1 (ggml/1469)
  webui: fix slow mic stop and WAV encode (ggml-org#22480)
  ggml-cpu : disable tiled matmul on AIX to fix page boundary segfault (ggml-org#22293)
  ...

# Conflicts:
#	.gitignore
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
* spec : fix draft model checkpoints

* cont : clean-up

* cont : gate the ngram-mod reset warning behind verbose flag